When the course started, we were asked to choose a corpus. It was important to find something that allowed for meaningful comparisons and contrasts, so that we could answer a specific research question. This gave me an interesting idea: since 2014, I have been keeping track of all the songs that I have listened to, using a website called Last.fm. Every time a track is played on a media player such as Spotify or iTunes, a “scrobble” is recorded. This way, I have scrobbled a total of 121,587 tracks (and counting!). What better corpus to choose than one that contains a large part of all the music you have ever listened to? Although at the start of this course I had never worked with an API, and had only just picked up intermediate R skills in the third year of my Psychology degree, it sounded like an interesting challenge. So, I started googling.
Very soon I found out that this might become a daunting task. Collecting all my scrobbles from the Last.fm API wasn’t the hard part; combining over 100,000 songs with Spotify features, however, was something I was not capable of. Luckily, I found a guide written by Andrew Walker, a researcher from the University of Florida, that included detailed instructions on how to do exactly this. Fetching the features would be the slowest part of the code, he said, likely taking up to 10-15 minutes. Obviously, for a dataset as large as mine, that was a gross underestimate. When I got the code working, I cut the fetching process into two parts and my dataset into 5 parts, and let it all run sequentially. Six long hours of waiting later, it was finally there: all my scrobbles and their corresponding Spotify features! From this point on, I knew that analyzing my corpus could lead to some very interesting results.
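The chunked, sequential fetching described above can be sketched roughly as follows. The original work was done in R, so this Python version is only an illustration. The names `split_into_parts`, `fetch_in_parts`, and the `fetch_one` callback (a stand-in for whatever looks up the Spotify features of a single track) are hypothetical and not taken from the guide.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_parts(items: Sequence[T], n_parts: int) -> List[Sequence[T]]:
    """Split items into n_parts contiguous chunks of nearly equal size."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(items), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        # The first `extra` chunks get one additional item each.
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


def fetch_in_parts(items: Sequence[T], n_parts: int,
                   fetch_one: Callable[[T], R]) -> List[R]:
    """Run fetch_one over every item, one chunk after the other."""
    results: List[R] = []
    for number, part in enumerate(split_into_parts(items, n_parts), start=1):
        results.extend(fetch_one(item) for item in part)
        print(f"finished part {number} of {n_parts}")
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a real feature lookup.
    tracks = ["track a", "track b", "track c"]
    print(fetch_in_parts(tracks, 2, str.upper))
```

Working through the chunks one at a time means that a failure halfway only costs the current part, which matters when the whole run takes hours.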
In this portfolio, I will try to answer one main research question: How does time influence my music listening? To answer this question, I will look at three different scales of time:
1. Hour of the day
2. Month of the year
3. Year of my life (also known as age, perhaps)
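The three time scales can all be derived from a single timestamp per scrobble. Below is a minimal Python sketch, assuming each scrobble carries a Unix timestamp in seconds; the function name `time_parts` is hypothetical. Note that the hour of the day depends on the time zone, so the zone is passed in explicitly, with UTC as the default.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def time_parts(unix_seconds: int, tz: tzinfo = timezone.utc) -> dict:
    """Return the hour, month and year of a scrobble in the given time zone."""
    moment = datetime.fromtimestamp(unix_seconds, tz=tz)
    return {"hour": moment.hour, "month": moment.month, "year": moment.year}


if __name__ == "__main__":
    # 1420122600 is 2015-01-01 14:30:00 UTC.
    print(time_parts(1420122600))
    # The same moment in a fixed UTC+1 offset (an assumption for illustration).
    print(time_parts(1420122600, timezone(timedelta(hours=1))))
```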
First, I will provide a visual overview of my data. Here you can find all sorts of interesting descriptive statistics for each year: how much music I’ve listened to, how my Spotify features have developed over time, and more. The most important explanatory variable is, of course, time.
Then, I will conduct more detailed analyses. Certain interesting patterns emerged from my preliminary analyses. How can they be explained? Can I find more information about them in chordograms, keygrams, or self-similarity matrices?
In 2015, I was still in high school, and I was listening to a lot of Mac DeMarco. I would put on his music and listen to all his albums from start to finish, pretty much on repeat. I have never listened to a single artist that much since, which is why the music I listened to in 2014 and 2015 is still at the top of my most played. Since then, I have started listening to a much wider variety of music, which can also be seen from my album plays in the gauges.
At the start of 2016, I was in the middle of my gap year. I was spending a lot of time playing guitar and listening to music, but I was still in the early phases of discovery. I carried all this into my first year of studying, where I was still listening to a lot of the music I had found the year before.
For me, 2017 got off to a strange start. I had quit studying philosophy and I was living in Amsterdam, but I had no idea what direction my life was going in. This was the point at which I felt that I needed to take some bigger steps. You can see this very clearly in my music listening: the number of different albums I listened to nearly tripled! It will be interesting to see if we can also find some trends in the Spotify features from this year onward.
2018 marked the start of something new. I started studying Psychology, and I was beginning to listen to music on a whole new level. Since I had to spend hours studying in the library, I started listening to different music as well: Boards of Canada was one of my go-to artists for studying, and has slowly become one of my favorite artists.
It was 2019, and things started gaining traction. I was discovering more of my would-be favorite artists: I had listened to Yo La Tengo and Boards of Canada before, and now came across all sorts of nineties bands I just couldn’t seem to get around. Suddenly I was also finding all sorts of electronic music I liked, which can be seen in the genre chart.
Over the year I have listened to a very large amount of music. I know my music listening habits very well, but can we also see them reflected in the data?
Now, let’s take a look at the Spotify features per hour. This way we can see whether patterns emerge, and which features might be worth looking into. The most noteworthy pattern is in the valence: it dips at night and goes up during the day. This seems to be inversely related to the instrumentalness. However, we will need to check whether this holds for the other years as well before drawing any conclusions. Other features also change over the course of the day, but at least in 2015 the differences do not seem large enough to be meaningful.
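One way to check the apparent inverse relation between valence and instrumentalness is to average each feature per hour and then correlate the two hourly series. The Python sketch below is only an illustration under assumptions: each play is a dictionary with an `hour` field and one field per Spotify feature, and the helper names `hourly_means` and `pearson` are hypothetical. The rows in the demo are made-up toy values, not my data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def hourly_means(plays: Sequence[dict], feature: str) -> Dict[int, float]:
    """Average one feature over all plays that share the same hour."""
    by_hour: Dict[int, List[float]] = defaultdict(list)
    for play in plays:
        by_hour[play["hour"]].append(play[feature])
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up toy rows, for illustration only.
    plays = [
        {"hour": 2, "valence": 0.2, "instrumentalness": 0.8},
        {"hour": 2, "valence": 0.4, "instrumentalness": 0.6},
        {"hour": 14, "valence": 0.7, "instrumentalness": 0.1},
        {"hour": 20, "valence": 0.5, "instrumentalness": 0.4},
    ]
    valence = hourly_means(plays, "valence")
    instrumental = hourly_means(plays, "instrumentalness")
    hours = sorted(valence)
    r = pearson([valence[h] for h in hours], [instrumental[h] for h in hours])
    print(f"correlation across hours: {r:.2f}")
```

A coefficient close to -1 would support the inverse relation; repeating the calculation per year would show whether the pattern holds outside 2015.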
In my portfolio, I want to see how my music listening has changed over the years. To analyse this, I have a corpus that consists of the songs I have listened to since 2014. One thing that might have changed is the key of the songs that I listen to. To investigate this, I made a histogram of the keys of all the songs I listened to in 2015, my final year of high school, and in 2019, when I was halfway through my second year of Psychology.
In the histogram, you can see for every key its share of all the songs that I listened to in 2015 and 2019. It seems that D is the most popular key, and D# the least. There are slight differences in key between the years, but most noticeably, it seems that in 2019 I was listening to far fewer songs in the key of A. Why is this?
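The proportions behind such a histogram can be computed in a few lines. This is a hedged Python sketch rather than the code I actually used: it assumes the keys are stored the way Spotify reports them, as integers from 0 (C) to 11 (B) with -1 meaning that no key was detected, and the function name `key_proportions` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

KEY_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_proportions(keys: Iterable[int]) -> Dict[str, float]:
    """Share of each key among all plays with a detected key."""
    # Drop -1 (no key detected) and anything else outside 0..11.
    valid = [k for k in keys if 0 <= k <= 11]
    if not valid:
        return {name: 0.0 for name in KEY_NAMES}
    counts = Counter(valid)
    return {KEY_NAMES[k]: counts[k] / len(valid) for k in range(12)}


if __name__ == "__main__":
    # Made-up toy input: two plays in A, one in D, one in C, one undetected.
    print(key_proportions([9, 9, 2, 0, -1]))
```

Running this once per year and comparing the two dictionaries gives the per-key differences that the histogram shows visually.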
To figure this out, I made a table of the artists I listened to in each year that wrote songs in A, and looked at the artists with the highest frequency. Not to my surprise, most songs in A were written by artists like Mac DeMarco, Beach House, Grizzly Bear, and The Black Keys, all of which are alternative/indie artists using guitars. The A chord is popular in songs written on guitar, since it can be played as an open chord, and it goes well with many other open chords. Since 2015, I have started listening to a lot less guitar-centered music, which might explain why I am also listening to fewer songs in the key of A.
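A frequency table like this one could be built as sketched below. This Python version is an illustration only and makes several assumptions: each play is a dictionary with `artist`, `year`, and `key` fields, the key of A is encoded as 9 (Spotify's pitch-class numbering), and the function name `top_artists_in_key` is hypothetical. The artist names in the demo are placeholders.

```python
from collections import Counter
from typing import List, Sequence, Tuple

A_KEY = 9  # pitch class of A when C is 0


def top_artists_in_key(plays: Sequence[dict], year: int,
                       key: int = A_KEY, n: int = 5) -> List[Tuple[str, int]]:
    """Most frequently played artists in a given key and year."""
    counts = Counter(
        play["artist"]
        for play in plays
        if play["year"] == year and play["key"] == key
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Placeholder rows, for illustration only.
    plays = [
        {"artist": "Artist X", "year": 2015, "key": 9},
        {"artist": "Artist X", "year": 2015, "key": 9},
        {"artist": "Artist Y", "year": 2015, "key": 9},
        {"artist": "Artist Y", "year": 2015, "key": 2},
        {"artist": "Artist Z", "year": 2019, "key": 9},
    ]
    print(top_artists_in_key(plays, 2015))
```

This counts plays rather than distinct songs; counting each song once would need the plays to be deduplicated by track first.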
For this plot, I used my top 15 songs from 2015 and 2019. Since I had to use the data I fetched from Last.fm, I had a hard time getting the data into a shape that would work with the compmus package. I managed to get it right in the end, however, and the resulting plot is quite interesting.
Here you can see the mean tempo plotted against the standard deviation (SD) of tempo, with colour indicating tempo, size indicating song duration, and opacity indicating loudness. There seem to be quite large differences between 2015 and 2019. First of all, the range of tempo is much larger in 2019: it spans from ~70 to ~160 BPM, while in 2015 the tempo is clustered around 100 BPM. This suggests that by 2019 my music taste had become more varied. The same can be seen in song duration and loudness: in 2015 they are fairly similar across songs, while in 2019 they vary more.
Interestingly, the standard deviation of tempo seems to increase with tempo. Does this mean that higher-tempo songs also deviate more in tempo? I have no clue.
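One possible reading, offered as an assumption rather than a finding, is that tempo fluctuations are roughly proportional to the tempo itself. In that case the absolute SD grows with the mean even if the relative variation stays the same. Dividing the SD by the mean (the coefficient of variation) removes that scaling. The Python sketch below illustrates the idea with made-up per-section tempi; the function name `tempo_summary` is hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def tempo_summary(section_tempi: Sequence[float]) -> Dict[str, float]:
    """Mean, sample SD and coefficient of variation of a song's tempi."""
    if len(section_tempi) < 2:
        raise ValueError("need at least two tempo values")
    average = mean(section_tempi)
    spread = stdev(section_tempi)
    return {"mean": average, "sd": spread, "cv": spread / average}


if __name__ == "__main__":
    # Made-up tempi in BPM: both songs fluctuate by about 2% of their tempo.
    slow_song = [98, 100, 102]
    fast_song = [147, 150, 153]
    print(tempo_summary(slow_song))
    print(tempo_summary(fast_song))
```

In this toy case the fast song has the larger SD, but both songs have the same coefficient of variation. Plotting that ratio against mean tempo would show whether faster songs in my corpus really are less steady, or whether the trend is only a matter of scale.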